Skip to content

fix(trace): make empty/aborted runs fail loudly instead of silently "running"#7

Merged
burrows99 merged 1 commit into
fix/collector-emit-bounded-memoryfrom
fix/trace-runs-fail-loudly
Jun 19, 2026
Merged

fix(trace): make empty/aborted runs fail loudly instead of silently "running"#7
burrows99 merged 1 commit into
fix/collector-emit-bounded-memoryfrom
fix/trace-runs-fail-loudly

Conversation

@burrows99

Copy link
Copy Markdown
Owner

Why

From the dashboard screenshot in the last session: a run.chrome session pinned at the RUNNING badge with 0 events, no video, no lineage — and the agent never flagged it, because the CLI told it the run succeeded. Every signal that would say "this trace is broken or incomplete" was a non-blocking warning, a stderr log, or dashboard-only state — never in the --json envelope the agent reads.

This is also a violation of the logger's own contract (src/shared/codes.ts): the stderr log and the envelope diagnostic for one event are supposed to carry the same code, so you can join "what I saw while it ran" to "what landed in the result." In practice the recording/upload failures were log-only and invisible to the result.

Changes

1. Terminal envelope on abortDynamicCommand

  • A throw mid-run (attach failed, engine crashed, recording threw) left the initial running partial orphaned in the dashboard forever. The run now emits a terminal envelope via onProgress — no running flag, ok:false, an ENGINE_FATAL diagnostic — so the collector resolves the session. Cli flushes that emit before exiting non-zero.

2. Recording outcomes are envelope diagnostics, not just stderrDynamicCommand.#record

  • RECORD_EMPTY (no frames → empty video) and RECORD (render/upload threw) were log.warn/log.error only, so "no video" was invisible to a --json reader. Both now also push a warn diagnostic with the same code as the log.

3. Upload failure no longer masquerades as a clean local saveDynamicCommand.#record

  • S3ArtifactStore.upload swallows failures and returns null, which #record couldn't tell apart from "no S3 configured" — both became a tidy "saved locally". It now distinguishes them and emits an UPLOAD diagnostic when a configured upload fails (link missing, local copy kept).

4. ENGINE_FATAL logged as well as diagnosed — so it appears in both channels per the contract.

The logging audit that motivated this (the "any violations?" answer)

Cross-referencing every log.* and Diagnostic.* call site against the codes.ts "same code in both channels" contract:

Class Codes Status
Log-only failures (invisible to the envelope) RECORD, RECORD_EMPTY, UPLOAD fixed here — now in both channels
Log-only, but surfaces elsewhere CHROME (→ now ENGINE_FATAL via abort path), LSP (→ CODEGRAPH_FAILED downstream) acceptable
In both channels (the contract, done right) CODEGRAPH_FAILED, EMIT, ENGINE_FATAL (now)
Diagnostic-only (no stderr trail) STEP_FAILED, BP_UNBOUND, BP_BOUND_UNHIT, GRAPH_TRUNCATED, COMPLEXITY_HIGH, DEPS_CIRCULAR, TOOL_MISSING acceptable — the envelope IS the agent's channel; documented
Intentional level split EMIT (log error, diagnostic warn) by design: the operation failed, the result is still valid

Also checked and clean: stdout discipline — the logger is stderr-only (logger.ts), and every process.stdout.write in src/ is a legitimate data/human channel write, so --json is never polluted.

Not fixed here (bigger / out of scope, tracked as recommendations): dashboard "age out stale running" UI net, and decoupling the final "done" emit from the S3 upload so a slow upload can't delay the running→done transition.

Tests

  • test/dynamic-diagnostics.test.js (new) — fake-tracer unit tests: thrown run → terminal envelope (running cleared, ENGINE_FATAL); captured fatal → ok:false; clean-empty node trace → ok:true, not running.
  • tsc --noEmit clean · npm test44/45 pass (1 DB round-trip skipped).

Base

Stacked on fix/collector-emit-bounded-memory (PR #6) — its Cli.ts edit is adjacent to this one's. Retarget to master once #6 lands.

🤖 Generated with Claude Code

…running"

A chrome run that captured nothing and recorded nothing was reported as
success (ok:true, exit 0) and could sit on the dashboard's "running" badge
forever — the agent had no signal it was broken. Root cause: the failure
signals lived in stderr logs or dashboard-only state, never in the JSON
envelope the agent reads. This closes those gaps and the matching violations
of the logger's own two-channel contract (codes.ts: the stderr log and the
envelope diagnostic for one event should carry the SAME code).

- Terminal envelope on abort. If a run throws (attach failed, engine crashed,
  recording threw), DynamicCommand now emits a terminal envelope via onProgress
  — no `running` flag, ok:false, an ENGINE_FATAL diagnostic — so the collector
  resolves the session instead of leaving its initial "running" partial
  orphaned. Cli flushes that emit before exiting non-zero.

- Recording outcomes are now envelope diagnostics, not just stderr. RECORD_EMPTY
  (no frames → empty video) and RECORD (render/upload threw) were log-only, so
  "no video" was invisible to a --json reader. Both now push a warn diagnostic
  carrying the same code as the log.

- Upload failure no longer masquerades as a clean local save. S3ArtifactStore
  swallows failures and returns null, which #record could not tell apart from
  "no S3 configured". It now distinguishes the two and emits an UPLOAD
  diagnostic when a configured upload fails (link missing, local copy kept).

- ENGINE_FATAL is now logged as well as diagnosed, so it appears in both
  channels per the contract.

Adds test/dynamic-diagnostics.test.js (fake-tracer unit tests for the abort
terminal envelope, captured-fatal, and clean-empty cases). typecheck + build
clean; 44/45 tests pass (1 DB round-trip skipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@burrows99 burrows99 merged commit 45fc8da into fix/collector-emit-bounded-memory Jun 19, 2026
1 check passed
@burrows99 burrows99 deleted the fix/trace-runs-fail-loudly branch June 19, 2026 07:30
burrows99 added a commit that referenced this pull request Jun 19, 2026
…ccurately (#6)

* fix(collector): bound emit failure memory and word network failures accurately

Follow-up to the emit-feedback diagnostics (#5), addressing two Copilot
review notes:

- Cli.ts kept every failed EmitResult in an array but only ever read the
  count and the last one. On a run that emits per onProgress, repeated
  failures grew that array unbounded. Replace it with a counter plus the
  most recent failure.
- The emit diagnostic said the collector "rejected" the emits even when
  the POST never reached it (connection refused/timeout/DNS — no HTTP
  status). Word an HTTP rejection and a delivery failure distinctly.
- Collector.emit returned the full rejection body, which a caller retains;
  a large error page could bloat memory. Cap the stored body at 10k chars
  (the log line still truncates to 500 independently).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(trace): make empty/aborted runs fail loudly instead of silently "running" (#7)

A chrome run that captured nothing and recorded nothing was reported as
success (ok:true, exit 0) and could sit on the dashboard's "running" badge
forever — the agent had no signal it was broken. Root cause: the failure
signals lived in stderr logs or dashboard-only state, never in the JSON
envelope the agent reads. This closes those gaps and the matching violations
of the logger's own two-channel contract (codes.ts: the stderr log and the
envelope diagnostic for one event should carry the SAME code).

- Terminal envelope on abort. If a run throws (attach failed, engine crashed,
  recording threw), DynamicCommand now emits a terminal envelope via onProgress
  — no `running` flag, ok:false, an ENGINE_FATAL diagnostic — so the collector
  resolves the session instead of leaving its initial "running" partial
  orphaned. Cli flushes that emit before exiting non-zero.

- Recording outcomes are now envelope diagnostics, not just stderr. RECORD_EMPTY
  (no frames → empty video) and RECORD (render/upload threw) were log-only, so
  "no video" was invisible to a --json reader. Both now push a warn diagnostic
  carrying the same code as the log.

- Upload failure no longer masquerades as a clean local save. S3ArtifactStore
  swallows failures and returns null, which #record could not tell apart from
  "no S3 configured". It now distinguishes the two and emits an UPLOAD
  diagnostic when a configured upload fails (link missing, local copy kept).

- ENGINE_FATAL is now logged as well as diagnosed, so it appears in both
  channels per the contract.

Adds test/dynamic-diagnostics.test.js (fake-tracer unit tests for the abort
terminal envelope, captured-fatal, and clean-empty cases). typecheck + build
clean; 44/45 tests pass (1 DB round-trip skipped).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(collector): bound the rejection body at the stream, add emit-diagnostic tests

Addresses the two Copilot review notes on the PR:

- Collector.emit buffered the whole rejection body via response.text()
  before slicing, so an oversized error page still caused a transient
  memory spike and download latency even though the stored body was
  capped. Read only up to MAX_BODY_CHARS off the stream, then cancel it —
  bounding memory and latency at the source.
- Extract the emit-failure diagnostic wording into an exported
  emitFailureMessage() helper and cover it with focused tests: HTTP status
  → "rejected", no status (network) → "failed" (the regression guard so a
  POST that never landed isn't reported as a rejection), plus the
  no-body / missing-error / oversized-body edge cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(collector): flush the TextDecoder so a split multi-byte char isn't dropped

readCappedText decoded chunks with { stream: true } but never did a final
decode() flush. A response ending on a multi-byte UTF-8 sequence split
across the last chunk boundary would drop/garble that trailing character in
the stored body and logged reason. Flush after the read loop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
burrows99 added a commit that referenced this pull request Jun 19, 2026
…#8)

Keep the prose docs in step with the behavior changes from the collector/trace
fixes (#5#7).

- README: the docker-compose section showed the container ports (:4747 ·
  :5432 · :9000) instead of the host-published ports the compose file actually
  maps. Correct to :14747 · :65432 · :19000 (console :19001), and the
  S3_ENDPOINT example to :19000, so the README matches docker-compose.yml. The
  native-serve example keeps :5432 (your own Postgres) — that one was right.

- skills/trace/SKILL.md: add a "Reading the result" section. The skill is
  deliberately minimal (the binary is the source of truth), but it gave zero
  guidance on interpreting a trace — the gap behind an agent treating an empty
  or perpetually-"running" trace as success. Names the three stable fields that
  decide whether a run did what was asked (ok · diagnostics[].code · meta.running)
  and the rule that ok:true + events:[] + a warn diagnostic means "ran fine, saw
  nothing — re-aim," not success. Field-level (no drift-prone code list).
burrows99 added a commit that referenced this pull request Jun 19, 2026
The DX-gaps analysis enumerated six ways the CLI misled an agent; all six
have since shipped. Keep the analysis verbatim as the rationale, but add a
status banner + table and a per-gap resolution note pointing to where each
was closed (and flagging the few sub-items still open: dashboard age-out of
stale "running" (#6b), done/upload decoupling (#6c), and lockfile-based port
auto-discovery (#5)).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant